home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
World of Video
/
World of Video.iso
/
datafiles
/
gfx_formats
/
gif
/
strtbl.txt
< prev
next >
Wrap
Text File
|
1995-02-13
|
4KB
|
72 lines
Non-Recursive String Table for LZW Decompressors - John Jeppson
The standard string table format, as suggested in "A Technique for High-
Performance Data Compression" by Terry A. Welch, is storage elements of the
form [prefix]K where [prefix] is a reference to an earlier string in the
table. Character output then consists of looking up a string table entry and
generating the output string by recursive decomposition. The characters come
off one at a time in reverse order, so are generally pushed on the stack and
copied off as a string, thereby correcting the order.
The [prefix]K format is particularly well suited to compressors since it
facilitates hashing methods for searching the table. Decompressors, on the
other hand, never search the table so during decompression this format
may not yield optimum performance. This file discusses an alternative
strategy for decompressors.
The trade-off is memory for speed. Computationally intensive recursive
decomposition is not required, but the entire output charstream must be
retained in memory as an array of bytes. The string table is implemented as
a table of pointers (and counts) to "earlier" strings already present in the
output stream. Output for each received code is then a simple matter of
copying the string from an earlier site to the present site.
If 'N' is the number of bits/pixel then the first 2**N strings in the string
table consist of a single character. These single-character strings are
the "root" codes and when first encountered do NOT previously exist in the
output stream. All subsequent strings are at least two characters in length
and do exist in the output charstream.
The string table should consist, therefore, of a set of 4096 records each
consisting of a pointer and an integer count. (The longest possible string
is less than 4096 characters, so a 16-bit count will suffice.) The first 2**N
string table entries should be pointers to a separate static string array,
and each count should be 1. This static string array can just be a
pre-initialized array [00,01,02...FE,FF] of byte. Whenever a clear code is
encountered the string table should be cleared and re-initialized to these
root-string pointers.
During decompression every code value received requires (1) a write of one or
more characters to the output stream, and (2) a new entry in the string table
(except for the first code). The destination address and count of the
current write and of the previous write are retained and updated as
decompression proceeds.
When a code is received there are two cases:
1. value(code) already in string table.
(a) copy value(code) to the output charstream at curAddress.
(b) install previousAddress/previousCount+1 as the next entry in the
string table. Note that by incrementing the count, the new string entry will
"overlap" onto the first character of the curAddress, which is just what we
want (i.e. [prefix]K ).
2. value(code) not yet in string table.
In this case we copy [prefix]K to the output charstream at curAddress.
Here [prefix] is the string at previousAddress/previousCount (from the
previous write operation) and K is the first character of that same string
(i.e., previousAddress[1]). We then add curAddress/curCount as the next
entry in the string table.
In every case we are simply copying a string of characters which should be
much faster than recursive decomposition. It won't work, however, if the
output charstream is not retained in memory. It also won't work, or at least
not gracefully, if the output charstream is maintained as small bitfields or
is broken up by intervening fill. This might be true, for example, if the
output stream is sent directly to screen memory. In those circumstances
recursive decomposition is the way to go.
With regard to the "root" string pointers, beware of relocation problems.
These absolute addresses should be loaded dynamically whenever the string
table is reset, they should never be hard coded into the application.